AOJP DATE 28/11/96    PCT NUMBER PCT/GB96/00700

AU96S1169 



(PCT) 



(51) International Patent Classification 6:
G01W 1/10, G09B 29/00


A1


(11) International Publication Number: WO 96/29619
(43) International Publication Date: 26 September 1996 (26.09.96)


(21) International Application Number: PCT/GB96/00700

(22) International Filing Date: 18 March 1996 (18.03.96)

(30) Priority Data: 

9505387.2 17 March 1995 (17.03.95) GB 


(81) Designated States: AL, AM, AT, AU, AZ, BB, BG, BR, BY,
CA, CH, CN, CZ, DE, DK, EE, ES, FI, GB, GE, HU, IS,
JP, KE, KG, KP, KR, KZ, LK, LR, LS, LT, LU, LV, MD,
MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD,
SE, SG, SI, SK, TJ, TM, TR, TT, UA, UG, US, UZ, VN,
ARIPO patent (KE, LS, MW, SD, SZ, UG), Eurasian patent
(AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent



(71) Applicant (for all designated States except US): UNIVERSITY
OF LEEDS [GB/GB]; Leeds, West Yorkshire LS2 8JT (GB).

(72) Inventors; and

(75) Inventors/Applicants (for US only): LENNON, Jack [GB/GB];
17 Wellhouse Avenue, Leeds, West Yorkshire LS8 4BY
(GB). TURNER, John, Richard, George [GB/GB]; 58 Ash
Hill Drive, Leeds, West Yorkshire LSP 8JP (GB).

(74) Agent: STANLEY, David, William; The Innovation Centre,
University Road, Heslington, York YO1 5DG (GB).



(AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU,
MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM,
GA, GN, ML, MR, NE, SN, TD, TG).



Published 

With international search report. 



(54) Title: MAPS 

(57) Abstract 

Apparatus for drawing a contour map 
of a spatial area (e.g. a country) comprises 
means (10, 20) for recording a set of first 
data values (e.g. temperature), each measured
at a respective predetermined point in the 
spatial area. Means (40, 50) is also provided 
for recording, for each of the predetermined 
points, various sets of further data values (e.g. 
latitude, longitude, altitude). Means (70) for 
fitting a mixed spline-regression model to the 
set of first data values is further provided, 
using as a spline variable at least one set 
of a plurality of sets of the further data 
values that have been selected by a multiple
regression analysis as predictive of the set of 
first data values, and using the others of the 
selected sets of further values as covariates. 
Further means (100) predicts, from the model, 
values of the first data at a plurality of points 
in the spatial area; and there is provided 
means (120), using the predicted values, for 
drawing a map of the spatial area, in which 
the predicted values are depicted. Using 
such apparatus, contour maps may readily 
be drawn for all points, predictive to a high 
degree of accuracy of a desired variable (e.g.
temperature) which has been measured only
at some points.

MAPS



This invention relates to the production of maps and is concerned 
particularly, although not exclusively, with the production of maps showing 
spatial distributions of climate - for example, the spatial distribution of
temperature over a geographical area. 



The possibility of imminent climatic change has focused attention on
the climate and its effects: it has become highly desirable to describe and
predict climates. Many aspects of the distribution and functioning of
organisms, with important implications for both ecology and agriculture,
may be strongly influenced by climatic factors. In this context, it may be
wished to estimate the climate of a place or area where there is no
meteorological data. Preferred embodiments of the present invention are
concerned with the prediction of mean monthly temperature from place to
place, given that, inevitably, there is a limit to the number of climate
recording stations at which temperature and other climatic variables may be
recorded.

There has been doubt and controversy over the best method for
producing a map of a complete climatic surface from a limited set of
stations. Attempts have been made to use the standard statistical procedure
of multiple regression to predict assorted climatic variables including
temperature from a set of topographic and location variables, but these have
been criticised and have found little favour in Britain. A more generally
used method involves some sort of interpolation to fit a surface between
recorded points; traditionally this has consisted of drawing isopleths on a
map by hand.

Recently, workers in Australia and New Zealand have used a thin
plate spline method of Wahba G and Wendelberger J (1980, Some new
mathematical methods for variational objective analysis using splines and cross
validation, Monthly Weather Review 108, 1122-1143) to predict the spatial
distribution of temperature and rainfall, and have extensively applied the
calculated surfaces to explanations of individual species distributions - for
example, Hutchinson MF (1989, A new objective method for spatial
interpolation of meteorological variables from irregular networks applied to the
estimation of monthly mean solar radiation, temperature, precipitation and
windrun. In Fitzpatrick EA and Kalma JD, Need for climatic and hydrological
data in agriculture in Southeast Asia. Proceedings of the United Nations
University Workshop, December 1983. CSIRO Canberra, Division of Water
Resources Technical Memorandum 89/5, 95-104).



The National Environment Research Council recently opened the 
extensive TIGER IV programme (Terrestrial Initiative in Global 
Environmental Change) to investigate a wide range of questions as to how 
climate is related to ecology and conservation. The work of the present 
inventors on the climatic relations of the British fauna (Turner JRG,
Gatehouse CM and Corey CA, 1987, Does solar energy control organic
diversity? Butterflies, moths and the British climate, Oikos 48, 195-203; Turner
JRG, Lennon JJ and Lawrenson JA, 1988, British bird species distributions and
the energy theory, Nature 335, 539-541), and the proliferation of such projects
under TIGER IV, have rendered urgent the construction of a set of maps of
the British climate. Maps on a suitably fine scale are generally not available.
The officially published maps show monthly temperature by hand-interpolated
isopleths (Meteorological Office, 1975, Maps of mean and extreme
temperature over the United Kingdom 1941-1970, HMSO, London): prediction
at intermediate points has to be by guesswork. In addition, the temperature
maps are corrected to sea level, which is meteorologically convenient because
removing the overwhelming effect of altitude shows the underlying trends.
Unfortunately, this renders the maps ecologically meaningless. Indeed a
surprising number of ecologists have failed to realise that published
temperature maps are drawn at sea level, and have consequently tried to
match species distributions against these fictional isopleths, with questionable
results. Overlays of temperature in wild-life atlases are likewise always
fictional sea-level maps. Plots of raw station data as colour-coded discs on
a map are excellent for objectivity but very hard to interpret.

True temperature maps are indeed difficult to draw because of the
need to superimpose an altitude map on a sea-level isotherm map. In
addition, because the rougher surface at real altitude can be interpolated
much less reliably than the smoother temperature surface at sea level, it is
not desirable to attempt interpolation between unadjusted station readings.
Effectively, such a routine would be attempting to predict the topographic
surface of the country from a thoroughly inadequate set of points. Only
comparatively recently, fine grained digital terrain models (DTMs) have
become available. Such DTMs give a detailed altitude map of a geographical
area, may be produced by satellite surveying techniques, and are typically
available as electronically stored digital data. The more detailed altitude data
enables the effect of altitude on climatic maps to be better taken into
account. However, real climatic conditions are clearly affected by more
than simple altitude alone, so a problem still remains in predicting such
climatic conditions with accuracy.

According to one aspect of the present invention, there is provided 
apparatus for drawing a map of a spatial area, comprising: 

means for recording a set of first data values, each measured at a 
respective one of a plurality of predetermined points in said area; 

means for recording a plurality of sets of further data values, each data 
value pertaining to a respective one of said predetermined points; 

means for fitting a mixed spline-regression model to the set of first 
data values, using as a spline variable at least one set of a plurality of sets of 
further data values that have been selected by a multiple regression analysis 
as predictive of said set of first data values, and using the others of said 
selected sets of further values as covariates; 

means for predicting, from said model, values of said first data at a 
plurality of points in said spatial area; and 

means, using said predicted values, for drawing a map of the spatial
area, in which said predicted values are depicted.

According to another aspect of the present invention, there is
provided a method of drawing a map of a spatial area, comprising the steps
of:



(a) recording a set of first data values, each measured at a respective 
one of a plurality of predetermined points in said area; 

(b) recording a plurality of sets of further data values, each data value 
pertaining to a respective one of said predetermined points; 

(c) using those of said sets of further data values that have been 
selected by a multiple regression analysis as predictive of said set of first data 
values; 

(d) fitting a mixed spline-regression model to the set of first data 
values, using at least one of said selected sets of further values as a spline 
variable, and using the others of said selected sets of further values as 
covariates; 



(e) predicting, from said model, values of said first data at a plurality
of points in said spatial area; and

(f) using said predicted values to draw a map of the spatial area, in
which said predicted values are depicted.

A method as above may include a step (c1) of performing a multiple
regression analysis on all of said sets of data values thereby to select those
of said plurality of sets of further data values as predictive of said set of first
data values.

Preferably, in said step (c1) of performing a multiple regression
analysis, at least one of said sets of further data values is rejected as
non-predictive of said set of first data values.

Preferably, two of said sets of further data values comprise longitude
and latitude values respectively and, in said step (d) of fitting a mixed
spline-regression model to the set of first data values, only said latitude and
longitude values are selected as spline variables.

Preferably, at least one of said sets of further data values is selected
from the group comprising:

the altitude of each of said predetermined points above sea level;

the shortest distance from each of said predetermined points to the
sea;

the maximum altitude to the east of each of said predetermined points
in a ± 25 km north-south band; and

the variables listed in Table 1 below.

Preferably, the values of at least one of said sets of further values are
measured or derived from a Digital Terrain Model (DTM).



Said first data values may be recorded over a first predetermined time 
period, and said map drawn to depict the values of said first data in a second 
time period. 

Said first predetermined time period may comprise at least one full 
calendar year.

Said second time period may represent a predetermined time of year. 

In the above, the term "map" may include a printed or displayed map, 
and/or a set or file of data from which a map may be constructed, and/or 
a look-up table of data from which various different maps may be derived.

For a better understanding of the invention, and to show how 
embodiments of the same may be carried into effect, reference will now be 
made, by way of example, to the accompanying diagrammatic drawings, in
which: 

Figure 1A is an outline map of Great Britain, showing the locations
of "fitted" climate stations from which climatic data was used to derive a
temperature map of the whole of Great Britain;

Figure 1B is an outline map of Great Britain, showing the locations
of "check" climate stations which were used to check the accuracy of the
temperature map derived from the fitted climate stations of Figure 1A;

Figure 2 is a graph to show "goodness of fit" ($r^2$) plotted against a
percentage of fitted stations removed from a map drawing method;

Figure 3 is a temperature map, derived by one example of a method 
in accordance with the present invention, to show the average January 
temperature for the whole of Great Britain, to a resolution of 5 km; 

Figure 4 is a temperature map similar to that of Figure 3, but showing 
variation between actual and predicted values; 

Figure 5 is a map of the whole of Great Britain, derived by an 
example of a method in accordance with the present invention, to show 
growing seasons; 

Figure 6 is a map of the whole of Great Britain, derived by an 
example of a method in accordance with the present invention, to show 
continentality; and 

Figure 7 is a block schematic diagram of one example of apparatus 
embodying the invention, for drawing a map. 

There now follows an example of the present invention, in which a 
temperature map of the whole of Great Britain is produced, to a 5 km 
resolution, from temperature data obtained from a relatively small number 
of scattered climate stations (weather stations), and from terrain data derived 
from a DTM of Great Britain. As will be shown, the accuracy of the 
temperature data of the map is better than 95%.

It is to be appreciated that the present invention is applicable to the
production of maps other than temperature maps (e.g. rainfall maps,
population maps); to the production of maps of areas other than Great
Britain or other geographical areas (e.g. any spatial area of interest); and to
maps at greater or lesser resolutions (e.g. with respect to Great Britain,
accurate temperature maps down to a resolution of a few hundred metres
may be possible). However, the following example of a temperature map
of Great Britain is a useful one, and has been carried out experimentally.



In the following example, we have used a method which we have
found to be an excellent predictor of temperature in Great Britain, and used
it with a DTM to generate monthly, seasonal and other maps. We have
used 30-year averages of monthly temperature from the national climate
recording stations for the period 1941-1970, which is the last period for
which data has been published (Meteorological Office, 1976, Averages of
temperature for the United Kingdom 1941-1970, HMSO, London), and we have
prepared the maps on a 5 km grid. The more remote outlying islands
(Orkney, Shetland, Isle of Man, Channel Isles, Scilly) were excluded, as was
Northern Ireland, because we were unable to obtain gridded topographic
(DTM) data for use in regression analyses. This gave a total of 206
recording stations.

Briefly, we used the following procedure:

1. The stations were divided into two groups, designated as the "fitted"
group (Figure 1A) and the "check" group (Figure 1B);

2. A model was built using the data from the fitted stations;

3. The goodness of fit of this model to the data from the check stations
was calculated;

4. Data from all stations (that is the fitted and check stations combined)
was then used to predict the climatic surface for the whole of Britain; and

5. Climatic maps were generated from this surface.



This procedure was performed to map a mean monthly temperature
for each month of the year independently; seasonal and other data was then
calculated by averaging, totalling or otherwise manipulating the monthly
predictions.

The complete set of stations was divided into the two groups by first
calculating a clustering index for each station (the sum of the reciprocals of
the squared distances from all other stations, $C_i = \sum_j d_{ij}^{-2}$ for the
$i$th station, where $j$ sums the inter-station distance $d$ across all remaining
stations). The station with the maximum of this parameter was then removed
and used as the foundation member of a new subset of stations; thereafter the
clustering index of each station remaining within the first subset was
calculated within each of the two subsets; the ratio of the two clustering
indices was then calculated and the station with the largest value of this
ratio was moved to the new subset. Application of this procedure to all
stations in turn produced two grids of stations with very similar overall
geographical distributions: one of these was arbitrarily designated as the
fitted group, and the other as the check group. The distributions of the
fitted and check stations are shown in Figures 1A and 1B.
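
The partition just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the inventors' program: the function names are hypothetical, station positions are taken as planar (x, y) pairs with no two stations coincident, the ratio is taken as the index within the first subset divided by the index within the new subset, and the loop stops once the two subsets are of equal size (the text does not state the direction of the ratio or the stopping rule).

```python
def clustering_index(i, members, coords):
    """Sum of reciprocal squared distances from station i to the given members."""
    xi, yi = coords[i]
    total = 0.0
    for j in members:
        if j == i:
            continue
        xj, yj = coords[j]
        total += 1.0 / ((xi - xj) ** 2 + (yi - yj) ** 2)
    return total


def split_stations(coords):
    """Split station indices into two subsets with similar spatial spread."""
    first = set(range(len(coords)))
    # Foundation member of the new subset: the most clustered station overall.
    seed = max(sorted(first), key=lambda i: clustering_index(i, first, coords))
    first.remove(seed)
    second = {seed}
    # Assumed stopping rule: continue until the subsets are the same size.
    while len(second) < len(first):
        def ratio(i):
            # Assumed direction: crowded in the first subset, isolated in the second.
            return clustering_index(i, first, coords) / clustering_index(i, second, coords)
        mover = max(sorted(first), key=ratio)
        first.remove(mover)
        second.add(mover)
    return sorted(first), sorted(second)
```

With three tight pairs of stations, for example, the sketch places one member of each pair in each subset, which is the behaviour the procedure is intended to give.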

The thin plate spline model of Wahba and Wendelberger (1980),
developed for climate surfaces by Hutchinson (1989), fits a non-linear
flexible n-dimensional surface to data points. The final surface is chosen by
a routine which virtually removes each data point in turn, and estimates the
deviation of its real value from the value predicted by the surface. The
surface finally chosen is the one which achieves an optimum compromise
between minimising the mean square deviation summed over all data points,
and the overall roughness of the estimated surface. The surface can be fitted
to latitude and longitude only, or may be fitted as a hyperdimensional
surface to further variables such as altitude and distance from the sea. Spline
surfaces may be calculated using M.F. Hutchinson's ANUSPLIN computer
package.
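
For orientation, the compromise between fidelity and roughness can be written in the standard textbook form of a smoothing thin plate spline. The notation below is an assumption introduced for illustration and is not taken from the specification: $z_i$ is the observed value at position $\mathbf{x}_i$, $f$ the fitted surface, $\lambda$ the smoothing parameter, $J_2$ the usual second-derivative roughness penalty in two dimensions, and $A(\lambda)$ the matrix mapping observed to fitted values. The second expression is the generalised cross validation score, a computationally convenient form of the point-removal procedure described above.

```latex
\min_{f}\; \frac{1}{n}\sum_{i=1}^{n}\bigl(z_i - f(\mathbf{x}_i)\bigr)^2 + \lambda\, J_2(f),
\qquad
J_2(f) = \iint \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) dx\, dy

V(\lambda) = \frac{\tfrac{1}{n}\,\bigl\lVert \bigl(I - A(\lambda)\bigr)\mathbf{z} \bigr\rVert^2}
                  {\Bigl[\tfrac{1}{n}\,\operatorname{tr}\bigl(I - A(\lambda)\bigr)\Bigr]^2}
```

The value of $\lambda$ that minimises $V(\lambda)$ is the one taken to give the optimum compromise.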

In our work, we constructed a thin plate spline model from data 
which had been selected by a multiple regression method, and the resulting
model is termed a mixed spline-regression model. 

In order to construct regression models, we used a total of eighteen
geographical and topographic variables, listed with abbreviations in Table 1
below, as a set of independent variables. The data was based on the national
grid, sampled in 5 km squares from a DTM with points spaced at 500 m.
Within each 5 km square, the contained grid of 100 points was used to
derive the mean, maximum and minimum altitude within the square
(ALTME, ALTMA, ALTMI), the standard deviation of altitude within the
square (ALTSD), and the percentage of the square which is not sea surface
(PLAND).
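
A minimal sketch of how the five per-square statistics might be computed from the 100 DTM points of one 5 km square is given below. The function name and the `is_sea` input (one flag per point, assumed to have been derived beforehand from a digitised coastline) are hypothetical, and the standard deviation is taken in its root mean square form.

```python
import math


def square_summary(altitudes, is_sea):
    """Summarise the DTM points of one grid square.

    altitudes: altitude of each point in metres.
    is_sea:    True for each point that is sea surface (assumed input).
    """
    n = len(altitudes)
    mean = sum(altitudes) / n
    # Root mean square deviation about the mean.
    sd = math.sqrt(sum((a - mean) ** 2 for a in altitudes) / n)
    land_points = sum(1 for flag in is_sea if not flag)
    return {
        "ALTME": mean,
        "ALTMA": max(altitudes),
        "ALTMI": min(altitudes),
        "ALTSD": sd,
        "PLAND": 100.0 * land_points / n,
    }
```

The same function applies unchanged to the 8100 points of a 45 km square.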



Table 1

Abbreviation    Terrain variables derived from the digital terrain model

EAST            Longitude, as 4-figure Ordnance Survey grid reference (eastings)
NORTH           Latitude, as 4-figure Ordnance Survey grid reference (northings)
ALT             Altitude of meteorological station (metres)
ALTME (+1)*     Mean altitude of 100 points in 5 km square
ALTMA           Maximum altitude of 100 points in 5 km square
ALTMI           Minimum altitude of 100 points in 5 km square
ALTSD           Standard deviation of altitude of 100 points in 5 km square
PLAND           Percentage of 5 km square that is land surface
ALTMEB (+1)*    Mean altitude of 8100 points in 45 km square
ALTMAB          Maximum altitude of 8100 points in 45 km square
ALTMIB (+1)*    Minimum altitude of 8100 points in 45 km square
ALTSDB (+1)*    Standard deviation of altitude of 8100 points in 45 km square
PLANDB          Percentage of 45 km square that is land surface
ALTWM           The maximum altitude to the west in a ± 25 km north-south band
ALTEM           The maximum altitude to the east in a ± 25 km north-south band
SLOS (+5)*      The large-scale slope of the land surface to the south (degrees)
SLOW (+5)*      The large-scale slope of the land surface to the west (degrees)
DIST            Shortest distance to the sea (km)

* for the log-linear regressions this value was added to each of the values to
avoid problems with ln 0.



Additional variables were calculated for 45 km squares of the national 
grid (each consisting of 81 of the 5 km squares), by computing within each 
45 km square the mean altitude, the average maximum altitude (mean of the 
5 km maxima), the average minimum altitude (similarly), and the standard 
deviation within the 45 km square (root mean square deviation of the 8100 
reference points within the square), and the percentage of the large square 
which is not sea-surface (names of variables as before, with the addition of 
'B'). This set of variables allowed us to include coarse scale topographic 
information in the models.

The maximum land-height to both the east and west (ALTEM,
ALTWM) of each 5 km square was found by searching the matrix of
altitudes in a band 50 km wide (25 km north and 25 km south) extending
east or west of the square as appropriate. These variables are intended to
allow for possible effects of distant high ground, for example the formation
of orographic cloud. The large scale slope of the ground in the vicinity of
each 5 km square (SLOS, SLOW) was found as the difference in mean
altitude between the reference square and the immediately adjacent 5 km
square to the south or to the west. These slopes are small (at most ± one
degree of arc) since the base line of 5 km is long in relation to the typical
mean altitude.
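
The band search and the slope calculation can be illustrated as follows. This is a sketch under assumptions that the text does not state: the altitude matrix is taken to hold one value per 5 km square (so that ± 25 km is ± 5 rows), column indices are taken to increase eastwards, and the slope is expressed as the angle whose tangent is the altitude difference over the 5 km base line. All names are hypothetical.

```python
import math


def max_altitude_in_band(grid, row, col, direction, half_width=5):
    """Largest value in the band of rows around `row`, east or west of `col`.

    grid is a list of rows; returns None when the band is empty.
    """
    first_row = max(0, row - half_width)
    last_row = min(len(grid) - 1, row + half_width)
    if direction == "east":
        cols = range(col + 1, len(grid[0]))
    else:
        cols = range(0, col)
    values = [grid[r][c] for r in range(first_row, last_row + 1) for c in cols]
    return max(values) if values else None


def slope_degrees(mean_here, mean_neighbour, baseline_m=5000.0):
    """Slope between two adjacent squares, in degrees of arc."""
    return math.degrees(math.atan2(mean_here - mean_neighbour, baseline_m))
```

A difference in mean altitude of about 87 m over the 5 km base line corresponds to one degree, which is consistent with the small slopes noted above.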

The distance from the sea (DIST) was calculated from a digitised
coastline. The proportion of the square (5 km or 45 km) (PLAND,
PLANDB) that is land surface expresses, besides proximity to the sea, an
abstraction of the shape of the coastline around that location. It was
computed as the percentage of points at zero altitude lying outside the
coastline (to exclude land surface at or below sea level).

In the derivation and verification of regression models, the actual grid
east and grid north positions (to six figures) (EAST, NORTH) of the climate
recording stations were used as variables. For other purposes, for example
the construction of maps from mixed spline-regression models, the grid
coordinates are those at the centre of the appropriate 5 km square. Similarly,
the altitude used in deriving and verifying regression and mixed
spline-regression models was the true height of the station (ALT); for map
construction from the regression and spline models this station height was
replaced with the mean altitude of the 5 km square (ALTME).

The above variables were used in multiple regression equations of the
form:

$$T_m = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n,$$

where $T_m$ = average monthly temperature, for a given month,
averaged over the sample period (thirty years);

$a$ = a constant;

$b_1, \dots, b_n$ = regression coefficients;

$x_1, \dots, x_n$ = variables.

In the following, the variables $x_1, \dots, x_n$ are the variables listed in
Table 1, and explained above. The regression coefficients $b_1, \dots, b_n$ vary for
each month of the year, and will have a positive, negative or zero (or
minimal) value. The following Table 2 gives the sign of the regression
coefficients $b_1, \dots, b_n$ for the twelve months of the year.

Table 2

[Table not reproduced: rows are the months of the year and columns are the
variables of Table 1. Each entry shows whether the variable is included in
the model with a positive regression coefficient, included in the model
with a negative regression coefficient, or not included in the model
(coefficient set at zero).]



Within each set of variables the best predictive model in each month
was chosen by using Statistical Analysis System routine PROC RSQUARE
(SAS 1985). This system runs multiple regressions for all possible
combinations of the independent variables and estimates the model with the
largest $r^2$ for a given sized subset of independent variables. Thus for each
month eighteen models were found, each model being the one with the
largest $r^2$ out of the set of all models with the specified number of variables
(between 1 and 18). For each month the model from this set of eighteen
with the highest adjusted $r^2$ was selected; this in effect removed models
which include variables that do not increase predictive power. Similar
model selection procedures such as Jp (Judge GG, Griffiths WE, Hill RC, Lee
Z 1980, The theory and practice of econometrics, Wiley, New York), PC
(Amemiya T, 1976, Selection of regressors, Technical Report # 225, Stanford
University, Stanford, California) and Cp (Mallows CL 1964 Some comments
on Cp, Technometrics 15, 661-675) were in strong agreement over which was
the best model. This procedure was considered superior to stepwise
elimination in which variables which are not individually significant can be
dropped, since our procedure was concerned with the predictive ability of
the model rather than with testing hypotheses about causal relationships
between the dependent and independent variables (Hocking RR 1976 The
analysis and selection of variables in linear regression, Biometrics 32, 1-49).
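
The selection rule (best $r^2$ within each model size, then best adjusted $r^2$ across sizes) can be sketched in plain Python as below. This is not PROC RSQUARE and makes no claim to reproduce its output; the function names are hypothetical, the least-squares fit is done through the normal equations, and the sketch assumes more observations than variables and no exactly collinear variables. Exhaustive search over eighteen variables would be slow in this form, so it is best read as an illustration on a handful of variables.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - tail) / m[k][k]
    return x


def r_squared(columns, y):
    """r^2 of a least-squares fit of y on the given columns plus a constant."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def best_subset(variables, y):
    """variables maps name -> list of values; returns (names, adjusted r^2)."""
    n = len(y)
    names = sorted(variables)
    best = None
    for k in range(1, len(names) + 1):
        def score(subset):
            return r_squared([variables[v] for v in subset], y)
        # Best model of this size by plain r^2.
        top = max(combinations(names, k), key=score)
        adjusted = 1.0 - (1.0 - score(top)) * (n - 1) / (n - k - 1)
        # Across sizes, keep the model with the highest adjusted r^2.
        if best is None or adjusted > best[1]:
            best = (top, adjusted)
    return best
```

Because the adjusted $r^2$ penalises each extra variable, a variable that leaves the plain $r^2$ unchanged is dropped from the chosen model.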

In a mixed spline-regression model, the surface is fitted to some of the 
variables by the thin plate spline routine described above, but only after the 
surface has been corrected to its expected values derived from multiple 
regression on a further suite of variables (termed for this purpose covariates). 
Operationally, the parameters of regression on the covariates are computed, 
and the expected value of each data point is determined from the regression 
equation. These expected values are then treated as raw data to which the 
spline surface is fitted. In the present example we have treated a wide suite 
of variables derived from the DTM as covariates, leaving only latitude and 
longitude to be fitted by the thin plate spline. The covariates were selected 
from the DTM using multiple regression techniques. 
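
In symbols, a mixed model of this kind is commonly written as a partial thin plate spline. The form below is the usual textbook statement and is offered only as an aid to reading; the symbols are assumptions introduced here and do not appear in the specification. $T_i$ is the temperature at station $i$, $f$ the spline surface over easting $E_i$ and northing $N_i$, $\psi_{ij}$ the value of the $j$th covariate at station $i$, $\beta_j$ its regression coefficient, and $\varepsilon_i$ a random error.

```latex
T_i = f(E_i, N_i) + \sum_{j=1}^{p} \beta_j\, \psi_{ij} + \varepsilon_i,
\qquad i = 1, \dots, n
```

In the present example $p$ would be the number of DTM covariates retained, with latitude and longitude entering only through $f$.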

There is a strong clustering of climate recording stations in the south 
of England, particularly in the London area. To examine the effect of such 
clustering on the regression models, stations in clusters were removed before 
the partition into the fitted and check datasets. Stations were removed from 
clusters in order of decreasing values of $C_i$ (the sum of the reciprocal squared
distances to the remaining stations - see above). The best predictive method
found (sLLr9 - Table 3) was then applied to the thinned out station grid;
repetition of this procedure with an increasing percentage of stations
removed allowed us to estimate the effect of clustering on the goodness of
fit.
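
A short sketch of the thinning step follows. It assumes planar (x, y) station positions with no two stations coincident, and it recomputes the clustering index after every removal, which is one reading of "distances to the remaining stations"; the function name is hypothetical.

```python
def thin_stations(coords, fraction_to_remove):
    """Return the indices kept after removing the most clustered stations first."""
    kept = list(range(len(coords)))
    n_remove = int(round(fraction_to_remove * len(coords)))
    for _ in range(n_remove):
        def index(i):
            # Clustering index of station i relative to the stations still kept.
            xi, yi = coords[i]
            return sum(
                1.0 / ((xi - coords[j][0]) ** 2 + (yi - coords[j][1]) ** 2)
                for j in kept
                if j != i
            )
        kept.remove(max(kept, key=index))
    return kept
```

Calling the function with increasing values of `fraction_to_remove` and refitting the model each time gives the kind of curve that Figure 2 plots.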



The accuracy of the fitted surfaces (i.e. fitted to the "fitted" stations) 
was estimated from the goodness of fit of the values which they predicted 
for the set of check stations compared with the actual meteorological 
readings at those stations. Fit was estimated as the coefficient of
determination ($r^2$) of the correlation ($r$) between the observed and predicted
values.
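
As a concrete reading of this measure, the following sketch computes $r^2$ as the square of the Pearson correlation between observed and predicted values at the check stations. The function name is hypothetical, and the sketch assumes that neither list is constant.

```python
import math


def coefficient_of_determination(observed, predicted):
    """Square of the Pearson correlation between two equal-length lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return r * r
```

A measure of this form is unaffected by a constant offset or a change of scale in the predictions, so it reports how well the predicted pattern follows the observed one rather than the absolute error in degrees.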



The $r^2$ values for goodness of fit between the surface predicted from the
fitted stations and the actual temperature values at the check stations are
shown in Table 3 below. For ease of reference, regression methods are
denoted by r, splines by s, with the variables following the letter: thus
sLLrA denotes a spline surface fitted to latitude and longitude, with
regression on altitude as a covariate. D denotes distance from the sea, and
numerous variables are denoted by their number: r18 indicates regression on
eighteen variables, with suffix f for the full set and s if a subset has been
selected.

Abbrev.  Method  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  Average

In selecting covariates, multiple regression is markedly improved by
the inclusion of distance from the sea, the improvement being slightly
greater if this variable is transformed to its logarithm. The best overall
regression model is that using a month by month selection from the
eighteen DTM variables, selection being by the model building criteria
described under Methods, with independent selection in each month: Table
2 above shows the significant monthly variables with the signs of their beta
coefficients. This model shows an $r^2$ value exceeding 0.9 (equivalent to a
correlation coefficient of at least 0.95) in most months of the year.

As would be expected, the pure multiple regression model noticeably 
diminishes in its predictive power if all eighteen DTM variables are included, 
with no attempt at selection (r18f). The regression model is also reduced in
predictive power if a single set of the eleven consistent variables (the eleven
variables that are included in the model in at least eight months of the year)
is used in all months, instead of the model being built individually for each
month (rLL9).

Regression models in which all independent variables, or all variables,
were transformed to logarithms or square roots were almost identical with
the equivalent untransformed models, and will not be discussed further.

The best overall fit is obtained from a mixed spline-regression model
(sLLr9) using the nine DTM variables which had the greatest consistent
performance in the regression model. That is, we have used the nine DTM
variables (other than latitude and longitude) which are found from
examination of Table 4 to have been selected for inclusion in the multiple
regression model in at least nine months of the year. We have then fitted
a thin plate spline with these nine variables as covariates and with latitude
and longitude as the spline surface variables. This gives a set of models with
every month having an $r^2$ of over 0.9. A somewhat worse fit is obtained if
all sixteen DTM variables are included as covariates (sLLr16f). Building a
separate mixed model for each month individually, using as covariates those
DTM variables (excluding latitude and longitude) that were selected by the
regression models for those individual months (so that for instance
September used twelve covariates and October used only ten; sLLr16s),
produced no alteration in the average accuracy of prediction compared with
the use of the nine consistent variables, the individually selected models
achieving first place by a small margin in a majority of months, but giving
a notably poor prediction for June.

According to a standard computation used in spline models, the 
accuracy of fit between the stations and the fitted spline surface has an r² 
better than 0.97 in all months of the year; such figures are frequently quoted 
to demonstrate the accuracy of thin plate splines. However, this figure 
represents only the ability of the spline surface to conform to data points; 
it does not represent its ability to predict temperature in areas away from 
stations. 
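
The distinction drawn above, between conformity to the fitted stations and prediction at stations withheld from the fit, may be sketched as follows. The random half-and-half split and the use of the squared Pearson correlation as r² are assumptions made for illustration; the manner in which the fitted and check sets were actually chosen is not restated in this passage.

```python
import numpy as np


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)


def split_fitted_check(n_stations, seed=0):
    """Split station indices into a fitted half and a check half (assumed random)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_stations)
    half = n_stations // 2
    return np.sort(order[:half]), np.sort(order[half:])
```

A model would be fitted to the stations in the first index set only, and `r_squared` then evaluated on the second set; the resulting figure, not the fit to the first set, indicates predictive ability.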



Figure 2 shows the effect on the fit of the best model (the mixed 
model sLLr9 just described), averaged over the whole year, produced by 
progressively removing stations from the grids of fitted and check stations. 
The stations have been removed in such a way that the most densely 
clustered stations are removed first, until only fifteen percent remain. 
Removal of up to seventy percent of temperature stations has little impact 
on prediction; the average annual r² shows only a small decline from its 
maximum value of 0.925. This suggests that about two-thirds of the most 
densely packed stations are redundant. Collapse of the method is then 
rapid. This suggests that satisfactory temperature surfaces could be produced 
by the mixed spline-regression method using only thirty percent of the fitted 
station grid, provided the selection was made carefully to exclude the most 
densely clustered stations. As the cross-verification procedure splits the 
station grid into two (the fitted and the check stations), this means that in 
fact only a half of thirty percent of the total 206 stations, that is a mere 32 
well-spaced stations, are required for an adequate fit. Forty or fifty stations 
would perhaps give an adequate safety margin. 
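
A thinning procedure of the general kind described above may be sketched as follows. The clustering index C used in the study is defined earlier in this description and is not reproduced here; this sketch substitutes the distance to the nearest remaining neighbour as a stand-in measure of crowding, which is an assumption, as are the function and parameter names.

```python
import numpy as np


def thin_stations(xy, keep_fraction):
    """Return indices of the stations kept after removing the most crowded first.

    xy            : (n, 2) station coordinates
    keep_fraction : fraction of stations to retain, e.g. 0.3
    """
    xy = np.asarray(xy, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(xy))))
    keep = list(range(len(xy)))
    while len(keep) > n_keep:
        pts = xy[keep]
        # Pairwise distances among the stations still retained.
        d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
        np.fill_diagonal(d, np.inf)
        nearest = d.min(axis=1)
        # Remove the station whose nearest neighbour is closest.
        keep.pop(int(np.argmin(nearest)))
    return np.array(keep)
```

The crowding measure is recomputed after each removal, so that a cluster is reduced gradually and is not removed as a whole.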

This thinning procedure has used the same set of nine consistent 
covariates throughout. It is possible that were the multiple regression model 
rerun independently at each stage of the thinning procedure, it would 
produce a slightly different recommended set of covariates. Thus some of 
the decreases in predictive power at intermediate percentages of stations 
might be avoided; it is encouraging that at almost all stages of thinning, as 
far as seventy percent removal, the average annual r² remains above 0.9. 



The relationship between the clustering index of stations (C - see 
above) and their individual residual deviations from the predicted surface 
derived from the complete station set shows that, when the sign of the 
residual is taken into account, there is no relationship between clustering 
and the accuracy of prediction. The absolute value of the residual is 
significantly correlated with isolation in five months of the year: isolated 
stations are more accurately predicted. This effect probably occurs because 
a spline will conform more closely to isolated stations than to the individual 
stations in a dense cluster, where it can conform only to their average 
because of restrictions which are placed on the roughness of the spline 
surface. This might result in the method compromising its ability to predict 
temperatures at points remote from stations in areas where the stations are 
already sparse. We have shown that this is not the case - there is no 
significant relationship between the residual and the degree of isolation of 
the stations in any month; in other words, the method does not predict 
temperatures at points remote from the stations any better (or any worse) 
in areas with sparsely distributed stations. The spline thus conforms well 
to the best data available: the average value in a cluster of stations, and the 
individual values of isolated stations. There is no evidence that it sacrifices 
long-distance accuracy for the sake of tight fits to closely spaced stations. 

As mixed spline-regression models provided the most satisfactory 
predictions of temperature, we derived a series of models that used all the 
British climate recording stations for the reference period: that is the fitted 
and check station sets combined (certain outlying stations are still omitted 
as detailed above). The station clusters were not thinned out. A separate 
model was derived for each month, using the same nine consistent DTM 
variables in the mixed spline-regression method sLLr9. 

Maps based on mixed spline-regression models were produced by 
using the model to fit a predicted value of the climatic variable to the suite 
of selected independent variables at each point on the 5 km grid. For most 
such points the variable ALT, which represents the altitude of the recording 
station, is meaningless, and it is therefore replaced for computation by 
ALTME, the mean altitude of the square, multiplied by the beta-coefficient 
of ALT. 
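
The substitution just described may be sketched as follows for the linear (covariate) part of the model at a single grid point. The variable name `SEA` and the coefficient values used in the usage note are hypothetical; only `ALT` and `ALTME` are names taken from this description.

```python
def covariate_term(betas, grid_values):
    """Sum of beta-coefficient times covariate value at one grid point.

    betas       : dict mapping covariate name to its fitted beta-coefficient
    grid_values : dict mapping covariate name to its value at the grid point
    """
    values = dict(grid_values)
    # A grid point has no recording station, so ALT is undefined there; the
    # mean altitude of the square (ALTME) is used in its place and multiplied
    # by the coefficient that was fitted for ALT.
    if "ALT" in betas and "ALT" not in values:
        values["ALT"] = values["ALTME"]
    return sum(beta * values[name] for name, beta in betas.items())
```

For example, with hypothetical coefficients `{"ALT": -0.006, "SEA": 0.01}` and grid values `{"ALTME": 200.0, "SEA": 30.0}`, the altitude contribution is computed from the mean altitude of the square, 200.0.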

A series of temperature maps was computed using the model which 
produced the overall best predictions: the mixed spline-regression method 
with the nine consistent DTM variables (sLLr9 - Table 3). A map for 
January is plotted as Figure 3, to show the mean temperature for this 
month. The distributions for the four Manley quarters - for instance, 
"spring" of March, April and May - are almost identical with the respective 
cardinal months of January, April, July and October. Figure 5 illustrates 
length of growing season, and Figure 6 continentality. We leave it to the 
reader's own judgement, what biological applications are suggested by these 
distributions, which in some cases appear to be the first to be published at 
real altitude (not corrected to sea level). 

Seasonal and annual maps were calculated as the average of the 
appropriate monthly maps, with the months weighted according to their 
lengths. Growing season was calculated as the number of days when the 
average temperature exceeded the index value of 6° Celsius, using linear 
interpolation of the annual temperature curve between the means of the 
relevant adjacent months. Degree days may be computed as the integral of 
the linearly interpolated annual temperature curve, multiplying days × daily 
temperature. Continentality is computed as the difference between the mean 
temperatures of the warmest and coldest months, which are usually but not 
invariably July and January. All these statistics, that is growing season, 
degree days and continentality, were computed point by point on the 5 km 
grid. 
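
The derived statistics described above may be sketched for a single grid point as follows. The sketch assumes a 365-day year, places each monthly mean at the middle of its month, and treats the annual curve as cyclic from December to January; these details, and the function names, are assumptions and are not stated in this description. Degree days are omitted because no base temperature is given here.

```python
import numpy as np

MONTH_LENGTHS = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])


def daily_curve(monthly_means):
    """Linearly interpolate twelve monthly means to 365 daily values."""
    m = np.asarray(monthly_means, dtype=float)
    ends = np.cumsum(MONTH_LENGTHS)
    mids = ends - MONTH_LENGTHS / 2.0  # mid-month position in days
    days = np.arange(365) + 0.5        # middle of each day
    # period=365 wraps the interpolation across the December-January join.
    return np.interp(days, mids, m, period=365.0)


def growing_season_days(monthly_means, threshold=6.0):
    """Number of days on which the interpolated curve exceeds the threshold."""
    return int((daily_curve(monthly_means) > threshold).sum())


def continentality(monthly_means):
    """Difference between the means of the warmest and coldest months."""
    m = np.asarray(monthly_means, dtype=float)
    return float(m.max() - m.min())


def weighted_mean(monthly_means, months):
    """Mean over the given month indices (0 = January), weighted by month length."""
    idx = np.asarray(months, dtype=int)
    m = np.asarray(monthly_means, dtype=float)[idx]
    return float(np.average(m, weights=MONTH_LENGTHS[idx]))
```

Applied point by point to the twelve monthly surfaces on the grid, these functions would yield growing season, continentality, and seasonal or annual mean surfaces respectively.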



As a check on possible biases in the final temperature surfaces, we 
have plotted month by month the geographical distributions of the residual 
differences between the real station values and the surface. We have found 
that there is a high level of consistency between the patterns exhibited 
throughout the year. There are no consistent overall geographical trends in 
the distribution of the residuals (Figure 4); this indicates that accuracy is not 
for instance biased by latitude, as it might have been if dense clusters had 
fitted markedly more or less easily than isolated stations, or towards the 
south of England, which might have been expected from the possible bias 
of the estimates of the beta coefficients of the regression model toward the 
values found in the region with the most densely clustered stations. 

There is however some real information in the distribution of the 
largest residuals. Nearly all the large positive values (station warmer than 
predicted by model), and some of the weaker ones, are centred on urban 
areas. This effect is obviously explained by the well known formation of 
heat islands in towns and cities. Those familiar with the map of Britain will 
be able to locate London, Southampton and Portsmouth, Brighton, Oxford, 
Nottingham, Birmingham, Liverpool and Manchester, Sheffield, Glasgow, 
Edinburgh, and Dundee in this way. The placing of the recording stations 
on the edge of some of the urban areas has weakened their effect, notably 
Manchester and Edinburgh, and the absence of a station in West Yorkshire 
(Leeds-Bradford) and Tyneside means that these conurbations cannot be 
detected. The heat island over London appears weak and fragmented 
apparently because the large number of stations within the urban area allows 
the predicted surface from the composite spline regression model (sLLr9) to 
rise consistently over this whole area (it is noteworthy that the comparable 
residuals from a pure DTM multiple regression model rlSs which lacks this 
flexibility, show a major heat island on London). One quite strong and 
persistent positive residual occurs west of Birmingham, in a region of small 
rural towns; possibly this represents also the effect of interpolation between 
the individual small heat islands. There is no obvious comparable 
explanation for the distribution of negative residuals (station cooler than 
predicted by model), which generally occupy much larger and more diffuse 
areas than the positive residuals: possibly some of the most extreme negative 
values centre on stations that are in major frost hollows; this is for instance 
likely for Kielder Castle, which produces a moderate but consistent negative 
residual year-round in the Anglo-Scottish border, and which lies at the 
confluence of several steep narrow valleys. The model might therefore be 
improved by using even finer scale topographic data to investigate 
mesoclimatic effects in the vicinity of the recorder; clearly it could be 
improved by introducing an index of urbanisation as a covariate. 



The largest residuals are less than one half of one degree, except for 
seven values which lie between 0.5 and 0.8 of a degree: this is consistent 
with the usual accuracy of thin plate splines when plotting temperature 
surfaces. Some of the effects on the accuracy of prediction of removing 
clustered stations (above) may indicate, in addition or instead, the effect of 
removing urban stations, as there is a particularly large cluster in the 
London area. 



The best fit of all was obtained by using a suite of DTM variables 
previously identified by multiple regression as having a significant 
contribution to the regression model. In this particular case, the year-round 
fit was about equally as good when the variables were selected by an 
independent model for each month, or when the nine variables that had the 
most consistent appearance over the whole year were used in all months. 
Although in this study we have used the slightly better year-round model 
for our final maps, we anticipate that in most applications the better result 
will be obtained from individual monthly models. 

The predicted surface using the mixed regression-spline method is 
shown by cross-validation to have a better than ninety five percent 
correlation with reality in all months of the year. 

Our maps of temperature (e.g. Figure 3) are among the first to be 
produced for Great Britain, in particular in attempting to plot at real altitude 
rather than reduced to sea level: our map of the length of the growing 
season appears to be the first such to cover the whole of Great Britain 
(previous attempts covered England and Wales only). Cross-validation 
suggests that these maps are always better than 95 percent correlated with 
reality. It is likely that there is some geographical variation in accuracy, 
although it is difficult to say what pattern this might occur in, given that the 
maps of the residuals for temperature show no obvious trends apart from 
the effect of urbanisation. We suggest that clearly an index of urbanisation 
would improve temperature prediction. Examination of the DTM variables 
by graphical methods, and selective transformation to logarithmic and other 
scalings, might produce some further improvement. The method we have 
employed here is clearly applicable to other meteorological data: minimum 
and maximum temperatures, either monthly or daily, would be easy to 
analyse in this way. A particularly desirable extension would be an analysis 
of rainfall patterns. 

The maps have direct application to ecogeographical studies of data 
gathered predominantly in the period 1941-70, such as the Atlas of the British 
Flora (Perring and Walters 1962). However, during the century up to the 
present in which climate data have been intensively studied it has been 
established that thirty-year averages provide quite accurate predictions for 
mean climate in subsequent periods. Our maps should therefore prove to 
be adequate for ecological studies that use data gathered on either side of this 
period. 

The predictions in the above example are of a mesoclimate; detailed 
predictions of microclimate could require different considerations of local 
conditions, for instance in the prediction of frost-hollows or of the effects 
of local north and south slopes. Our predictions relate to the average 
altitudes of national grid squares of 5 km side: further local correction (using 
for instance the standard adiabatic lapse rate) would be needed to predict the 
temperature of any point within such a square that deviated considerably 
from the average altitude. 
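
A local correction of the kind mentioned above may be sketched as follows. No numerical lapse rate is given in this description; the default of 6.5 degrees Celsius per kilometre used here is the tropospheric lapse rate of the standard atmosphere and is an assumption (the dry adiabatic rate, about 9.8 degrees per kilometre, could be supplied instead). The function and parameter names are likewise hypothetical.

```python
def correct_for_altitude(square_temp_c, square_mean_alt_m, point_alt_m,
                         lapse_rate_c_per_km=6.5):
    """Adjust a grid-square temperature to the altitude of a point within it.

    square_temp_c       : temperature predicted for the square's mean altitude
    square_mean_alt_m   : mean altitude of the square, in metres
    point_alt_m         : altitude of the point of interest, in metres
    lapse_rate_c_per_km : assumed fall in temperature per kilometre of ascent
    """
    height_difference_km = (point_alt_m - square_mean_alt_m) / 1000.0
    return square_temp_c - lapse_rate_c_per_km * height_difference_km
```

A point lying above the mean altitude of its square is thus assigned a lower temperature than the square as a whole, and a point lying below it a higher one.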

Potential applications of climate surfaces of this type are clearly 
diverse, from studies of the limitation of individual species by climatic 
factors to studies of species richness. Our DTM mixed spline-regression 
model for temperature, gridded by interpolation to a resolution of 2 km, 
will be applied to the changing distribution of the British bird fauna and to 
explaining the distribution of bird diversity in the British winter. The 
general method used, fitting a thin plate spline surface with variables, 
derived from a DTM and selected by multiple regression techniques, as 
covariates, is likely to have wide applications in biological meteorology, 
general meteorology, ecology and geography; clearly it may be applicable to 
a wide variety of meteorological and other environmentally significant 
statistics. 

For ecological purposes, mesoclimatic conditions of temperature at 
points distant from climate recording stations in Britain can be predicted 
with an accuracy of better than 95 per cent. The preferred method for 
plotting monthly temperature surfaces for Great Britain is a thin plate spline 
fitting the variables latitude and longitude, with the entry of selected DTM 
variables as covariates. Selection of the DTM variables is by multiple 
regression, using model-building criteria. 

Although the above-described examples of the invention produce 
maps which may be, for example, either displayed on a visual display unit 
(e.g. a monitor screen) or printed onto a print medium, other embodiments 
of the invention may alternatively produce data from which a map or table 
may be displayed or printed. For example, apparatus in accordance with 
certain embodiments of the invention may produce a file of electronic data 
that may be stored, e.g. in the form of a look-up table, the data representing, 
for example, temperature values that have been predicted by a method as 
described in the foregoing. 

Figure 7 is a block schematic diagram of apparatus for drawing a map, 
which may operate in accordance with a method as described above. 

The apparatus of Figure 7 comprises a main processor 100 for 
calculating a local climate variable, by predicting values of temperature at a 
plurality of points in the spatial area of interest (for example, the United 
Kingdom), using data input from a DTM data store 50, via an output device 
110, and an equation generated by a processor 70 and output via an output 
device 90. The predicted values as calculated by the main processor 100 are 
fed to a printer 120, which outputs a printed map 130. For example, the 
printed map 130 may have the form shown in any of Figures 3 to 6. 

The processor 70 calculates a general climatic model from climate data 
that is received from a climate data store 20 via an output device 30, and 
DTM data values that are output from a DTM data store 50 via an output 
device 60. A processor 80 receives both climate data from the store 20 and 
DTM data from the store 50, and selects therefrom covariates for the mixed 
spline-regression model that is calculated in the processor 70. The climate 
data store 20 receives climate data from an input device 10, and the DTM 
data store 50 receives DTM data from an input device 40. 

The apparatus of Figure 7 may be arranged to perform any of the 
methods described above, as examples of the present invention. 

The reader's attention is directed to all papers and documents which 
are filed concurrently with or previous to this specification in connection 
with this application and which are open to public inspection with this 
specification, and the contents of all such papers and documents are 
incorporated herein by reference. 

All of the features disclosed in this specification (including any 
accompanying claims, abstract and drawings), and/or all of the steps of any 
method or process so disclosed, may be combined in any combination, 
except combinations where at least some of such features and/or steps are 
mutually exclusive. 

Each feature disclosed in this specification (including any 
accompanying claims, abstract and drawings), may be replaced by alternative 
features serving the same, equivalent or similar purpose, unless expressly 
stated otherwise. Thus, unless expressly stated otherwise, each feature 
disclosed is one example only of a generic series of equivalent or similar 
features. 



The invention is not restricted to the details of the foregoing 
embodiment(s). The invention extends to any novel one, or any novel 
combination, of the features disclosed in this specification (including any 
accompanying claims, abstract and drawings), or to any novel one, or any 
novel combination, of the steps of any method or process so disclosed. 



CLAIMS: 



1. Apparatus for drawing a map of a spatial area, comprising: 

means for recording a set of first data values, each measured at a 
respective one of a plurality of predetermined points in said area; 

means for recording a plurality of sets of further data values, each data 
value pertaining to a respective one of said predetermined points; 

means for fitting a mixed spline-regression model to the set of first 
data values, using as a spline variable at least one set of a plurality of sets of 
further data values that have been selected by a multiple regression analysis 
as predictive of said set of first data values, and using the others of said 
selected sets of further values as covariates; 

means for predicting, from said model, values of said first data at a 
plurality of points in said spatial area; and 



means, using said predicted values, for drawing a map of the spatial 
area, in which said predicted values are depicted. 

2. A method of drawing a map of a spatial area, comprising the steps of: 

(a) recording a set of first data values, each measured at a respective 
one of a plurality of predetermined points in said area; 

(b) recording a plurality of sets of further data values, each data value 
pertaining to a respective one of said predetermined points; 

(c) using those of said sets of further data values that have been 
selected by a multiple regression analysis as predictive of said set of first data 
values; 



(d) fitting a mixed spline-regression model to the set of first data 
values, using at least one of said selected sets of further values as a spline 
variable, and using the others of said selected sets of further values as 
covariates; 



(e) predicting, from said model, values of said first data at a plurality 
of points in said spatial area; and 

(f) using said predicted values to draw a map of the spatial area, in 
which said predicted values are depicted. 

3. A method according to claim 1 or 2, including a step (c1) of 
performing a multiple regression analysis on all of said sets of data values 
thereby to select those of said plurality of sets of further data values as 
predictive of said set of first data values. 

4. A method according to claim 3, wherein, in said step (c1) of 
performing a multiple regression analysis, at least one of said sets of further 
data values is rejected as non-predictive of said set of first data values. 

5. A method according to claim 2, 3 or 4, wherein two of said sets of 
further data values comprise longitude and latitude values respectively and, 
in said step (d) of fitting a mixed spline-regression model to the set of first 
data values, only said latitude and longitude values are selected as spline 
variables. 

6. A method according to any of claims 2 to 5, wherein at least one of 
said sets of further data values is selected from the group comprising: 

the altitude of each of said predetermined points above sea level; 

the shortest distance from each of said predetermined points to the 
sea; 

the maximum altitude to the east of each of said predetermined points 
in a ± 25 km north-south band; and 

the variables listed in Table 1 above. 

7. A method according to any of claims 2 to 6, wherein the values of at 
least one of said sets of further values is measured or derived from a Digital 
Terrain Model (DTM). 

8. A method according to any of claims 2 to 7, wherein said first data 
values are recorded over a first predetermined time period, and said map is 
drawn to depict the values of said first data in a second time period. 

9. A method according to claim 8, wherein said first predetermined time 
period comprises at least one full calendar year. 

10. A method according to claim 8 or 9, wherein said second time period 
represents a predetermined time of year. 



11. A method of drawing a map of a spatial area, substantially as 
hereinbefore described with reference to the accompanying drawings. 

12. Apparatus for drawing a map of a spatial area, adapted to perform a 
method according to any of claims 2 to 11. 

13. Apparatus for drawing a map of a spatial area, substantially as 
hereinbefore described with reference to the accompanying drawings. 

14. Apparatus or a method for drawing a map of a spatial area, according 
to any of the preceding claims, wherein said first values represent 
temperature. 

15. Apparatus or a method for drawing a map of a spatial area, according 
to any of the preceding claims, in combination with any or all of the 
features disclosed in this specification, including the accompanying abstract 
and drawings. 

Figure 5: Growing Season, as days of year above 6° Celsius 

Figure 6 